Abstract

This paper summarizes initial progress in developing
a computer system that can be rapidly programmed to
analyze any class of pictorial scenes. Scene analysis
programs have been awkward to develop using conventional
programming systems because of the difficulty of
formulating pictorial descriptions in symbolic terms.
Picture processing techniques are inherently ad hoc and
must be deduced empirically for each application.

We have constructed an interactive system specifically
designed for expressing and experimenting with
perceptual strategies. The system allows an experimenter
to describe basic perceptual concepts to a computer
in terms of pictorial examples. The examples are
designated graphically by encircling areas of a displayed
scene with a cursor. A concept is represented
internally by values of primitive feature-extraction
operators that distinguish it from examples of previously
defined concepts. Concepts so defined constitute a common
vocabulary, shared by man and machine, that can be
used symbolically in describing objects and specifying
scene analysis procedures.

The system has been used interactively to formulate
descriptions that distinguish objects in indoor room
scenes and programs that locate these objects in images.


A  Description  of  ISIS 

ISIS, the Interactive Scene Interpretation System, is
an integrated set of INTERLISP functions facilitating
the development and testing of pictorial representations.
The system consists primarily of an image file, a library
of primitive feature extraction functions, a means
for applying selected primitive operators to graphically
designated areas of a scene, a symbolic data structure
for accumulating concept definitions, and an iconic
data structure for retaining pictorial examples of
those concepts. A detailed description of the system
with applications is available from the authors (SRI
Artificial Intelligence Center, Technical Note 87).

NAME
CLASS
PROPERTIES  (Values of operators legal for this class of nodes.)
IS IT?      (A procedure to determine whether something is an
            example of this concept.)
FIND A?     (A procedure to find an example of this concept.)
EXAMPLES    (A list of known examples of this concept.)

(a)  STRUCTURE OF A GENERAL NODE

NAME        Tabletop
CLASS       Object
PROPERTIES  HUE: Buff
            ORIENTATION: Horizontal
IS IT?      Procedure to test whether an object has the
            distinguishing features of Tabletop.
FIND A?     Procedure to find a Tabletop in a given picture.
EXAMPLES    Picture 6, Region 2

(b)  EXAMPLE OF A NODE


FIGURE 1

The symbolic data structure is a semantic net whose
nodes represent prototype patterns for objects, attributes,
and relations. Prototypes are defined symbolically
in terms of other nodes or iconically by reference
to designated regions of a digitized image. The system
can obtain values for unspecified properties of a concept
by applying operators to its examples. Figure
1(a-c) illustrates the semantic net. In Figure 1(c),
the object TABLETOP and the attribute BUFF are defined
iconically, while the object FLOOR and the attribute
HORIZONTAL are defined symbolically in terms of other
nodes. FLOOR is distinguished from TABLETOP by height.
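
The node layout of Figure 1 can be pictured with a short sketch. The
following Python fragment is only an illustration under stated
assumptions, not the INTERLISP implementation: the class name Node, the
helper functions define and matches_properties, and the dictionary net
are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of the semantic net, with the slots listed in Figure 1(a)."""
    name: str
    node_class: str                                   # e.g. "Object" or "Attribute"
    properties: Dict[str, str] = field(default_factory=dict)
    is_it: Optional[Callable[[Dict[str, str]], bool]] = None   # the IS IT? slot
    find_a: Optional[Callable[[list], Optional[dict]]] = None  # the FIND A? slot
    examples: List[Tuple[int, int]] = field(default_factory=list)  # (picture, region)


net: Dict[str, Node] = {}   # hypothetical container: concept name -> node


def define(node: Node) -> Node:
    """Add a node to the net under its name."""
    net[node.name] = node
    return node


def matches_properties(node: Node, measured: Dict[str, str]) -> bool:
    """A generic IS IT? test: every stored property must match the measurement."""
    return all(measured.get(key) == value for key, value in node.properties.items())


# The node of Figure 1(b): symbolic properties plus one iconic example.
tabletop = define(Node(
    name="TABLETOP",
    node_class="Object",
    properties={"HUE": "BUFF", "ORIENTATION": "HORIZONTAL"},
    examples=[(6, 2)],      # Picture 6, Region 2
))
tabletop.is_it = lambda measured: matches_properties(tabletop, measured)
```

Here the IS IT? slot is filled with a plain property comparison; in the
system described above it would hold whatever test procedure the
trainer has formulated for the concept.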

Iconic regions are represented as an explicit list
of picture samples and/or as a list of vertices describing
a closed polygonal boundary. Efficient routines
exist for determining whether a given picture
element is contained within a bounded region, for obtaining
a set of samples over a bounded region, and for
fitting a boundary around a set of samples.
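
As an illustration of the first two routines, the sketch below uses the
standard even-odd ray-casting test for point-in-polygon membership. It
is a generic technique chosen for this example; the source does not say
which algorithm ISIS uses, and the function names contains and
samples_in are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def contains(polygon: List[Point], p: Point) -> bool:
    """Even-odd ray-casting test: is p inside the closed polygon?"""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def samples_in(polygon: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """All picture elements (col, row) of a width x height image inside the polygon."""
    return [(col, row)
            for row in range(height)
            for col in range(width)
            if contains(polygon, (col, row))]


# A square boundary drawn between picture elements of a small 6 x 6 image.
square = [(1.5, 1.5), (4.5, 1.5), (4.5, 4.5), (1.5, 4.5)]
inner = samples_in(square, 6, 6)
```

Placing the vertices between picture elements, as in the square above,
avoids the question of how samples lying exactly on the boundary should
be classified.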

The primitive functions and symbolic data structure
reside within the interactive environment of INTERLISP
(formerly BBN-LISP) on a PDP-10. The raw data arrays,
and support routines for graphics, file handling,
coordinate transformations, etc., reside in a FORTRAN
environment accessible as a lower fork through the TENEX
operating system.

Images are stored in 120 X 120 sample arrays, each
sample characterized by TV brightness through red, green,
blue, and neutral filters and by a range measurement.
(These range measurements are the simulated output of a
time-of-flight range finder, currently under development.)
A high resolution vector display and an MOS-refreshed
color video monitor with a low resolution vector overlay
serve as graphic output devices. Both displays allow
users to select points on the screen using a cursor.


(c)  LINKAGE OF CONCEPTS AND EXAMPLES

THE SEMANTIC NET


The Use of ISIS


To describe concepts in terms of the computer's
primitive functions, a human must know (or be able to
determine easily) what those primitive functions can
distinguish. Our system provides the trainer with tools
with which to choose primitive functions empirically.

The system first displays a digitized scene as a color
image or as a thresholded gradient. This display
qualitatively conveys which color boundaries are easily
discriminated by the computer. Users can circle regions of
the displayed image and obtain from the system either
average or extreme numerical values for local operators,
such as height, hue, saturation, and surface normal.
Thus, by applying operators to examples and
counterexamples of pictorial concepts in the image, the trainer
can discover directly which operators provide sufficient
discrimination. Users can try out a proposed description
by requesting the system to illuminate all parts of
the displayed scene that correspond to the description.
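
The comparison of an operator on examples and counterexamples can be
sketched as follows. This is an illustrative Python fragment, not ISIS
code: the functions summarize and separates, the toy table HEIGHT (in
arbitrary units), and the two regions are all assumptions made for the
example.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple

Sample = Tuple[int, int]   # (col, row) of a picture element


def summarize(operator: Callable[[Sample], float],
              region: Iterable[Sample]) -> Dict[str, float]:
    """Average and extreme values of a local operator over a circled region."""
    values = [operator(s) for s in region]
    return {"min": min(values), "mean": mean(values), "max": max(values)}


def separates(operator: Callable[[Sample], float],
              examples: Iterable[Sample],
              counterexamples: Iterable[Sample]) -> bool:
    """True if the operator's value ranges on the two regions do not overlap."""
    a = summarize(operator, examples)
    b = summarize(operator, counterexamples)
    return a["max"] < b["min"] or b["max"] < a["min"]


# Hypothetical height readings at a few picture elements.
HEIGHT = {(10, 80): 0.0, (12, 82): 0.1, (40, 50): 2.4, (42, 51): 2.5}


def height(sample: Sample) -> float:
    return HEIGHT[sample]


floor_region = [(10, 80), (12, 82)]
table_region = [(40, 50), (42, 51)]
table_stats = summarize(height, table_region)
height_is_enough = separates(height, table_region, floor_region)
```

Non-overlapping ranges are only one possible reading of "sufficient
discrimination"; a trainer could equally well inspect the averages and
extremes by eye, as the text describes.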

The  use  of  ISIS  to  formulate  pictorial  concepts  is 
best  illustrated  by  example.  The  following  examples  are 
based  on  the  simple  room  scene  shown  in  Figure  2. 
FIGURE 2  DIGITIZED ROOM SCENE VIEWED ON COLOR MONITOR


Description  of  Attributes 


Using ISIS, the trainer can define common surface
attributes, such as the color BUFF and the orientation
HORIZONTAL, with which to describe surfaces symbolically.
These attribute labels can usually be defined by a
characteristic range of values for a single primitive
operator. The range of values can be determined
empirically by applying the operator to pictorial examples of
the attribute. HORIZONTAL, for instance, might be
defined as any surface whose normal is within 5 degrees of
the Z axis, based on values of the ISIS function ORIENT
applied to image samples on the FLOOR and TABLETOP.


The adequacy of a proposed description can be
tested by requesting the system to intensify points in
the displayed image that satisfy a proposed description.
This test is accomplished by using the description to
filter random image samples. In Figure 3, a set of
random image samples is shown superimposed on a gradient
display of the office scene. In Figure 4, only points
satisfying the suggested definition of HORIZONTAL are
retained. Figure 4 validates the definition of
HORIZONTAL, since all intensified points are on surfaces
normally thought of as HORIZONTAL, such as the FLOOR,
TABLETOP, and CHAIRSEAT.
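
The filtering test can be sketched in a few lines of Python. Everything
named here is an assumption for illustration: orient is a toy stand-in
for the ISIS function ORIENT (it simply declares the lower rows of a
120 X 120 image to be floor), and the tolerance is left as a parameter
whose default value is itself an assumption.

```python
import math
import random
from typing import Callable, List, Tuple

Sample = Tuple[int, int]               # (col, row) of a picture element
Vector = Tuple[float, float, float]    # surface normal (x, y, z)


def angle_from_z(normal: Vector) -> float:
    """Angle in degrees between a surface normal and the Z axis."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    cosine = max(-1.0, min(1.0, nz / length))   # guard against rounding
    return math.degrees(math.acos(cosine))


def is_horizontal(normal: Vector, tolerance_deg: float = 5.0) -> bool:
    """Attribute defined by a range of values of a single operator."""
    return angle_from_z(normal) <= tolerance_deg


def filter_samples(predicate: Callable[[Sample], bool],
                   samples: List[Sample]) -> List[Sample]:
    """Keep only the samples that satisfy the proposed description."""
    return [s for s in samples if predicate(s)]


def orient(sample: Sample) -> Vector:
    """Toy stand-in for ORIENT: rows 80 and below the horizon are floor."""
    col, row = sample
    return (0.0, 0.0, 1.0) if row >= 80 else (0.0, 1.0, 0.0)


rng = random.Random(0)   # fixed seed so the sample set is repeatable
samples = [(rng.randrange(120), rng.randrange(120)) for _ in range(200)]
horizontal = filter_samples(lambda s: is_horizontal(orient(s)), samples)
```

Displaying the points in horizontal over the scene would correspond to
the intensified samples of Figure 4; points landing on surfaces not
thought of as horizontal would indicate that the range needs adjusting.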
FIGURE 3  RANDOM IMAGE SAMPLES

FIGURE 4  HORIZONTAL IMAGE SAMPLES

Description of Objects


Defined attributes can be used symbolically to
describe objects. For example, TABLETOP might now be
defined to the system as a HORIZONTAL surface
distinguished by height. The appropriate height range was
determined by applying the ISIS function HEIGHT to a
few manually designated TABLETOP samples. Figure 5
confirms that height and orientation adequately
distinguish TABLETOP in this particular scene.

It is not always possible to specify a simple
predicate that will select all image samples belonging
to a desired object and to no others. For such cases,
the repertoire of ISIS primitives described in this
summary can be augmented with special purpose feature
extraction routines. Procedures that validate edges,
grow regions, and bound rectangular surfaces have been
developed by several users and are now generally
available on ISIS. Further examples of the use of ISIS for
characterizing objects can be found in Technical
Note 87 and in a paper by Garvey and Tenenbaum (see
"On the Automatic Generation of Programs for Locating
Objects in Office Scenes," this volume).

FIGURE 5  RANDOM SAMPLES WITH HEIGHT AND SURFACE
ORIENTATION OF TABLE TOP


Goals for Interactive Scene Analysis


Ideally, one would like to program a computer to
find an unfamiliar object as one would instruct a
person, by providing a crude description of the desired
object. A tree, for example, might be described as a
"green, leafy region above a tall, brown, vertical,
bark-textured region." The computer could demonstrate
comprehension by outlining instances of the described
object in a displayed image. The programmer could then
refine the description empirically to correct errors in
the computer's interpretation.

Communication on this level with a machine is
hampered because the machine does not share an
"understanding" with the human of basic descriptive concepts
like "green," "leafy," "above," "vertical," and
"bark-textured." Such concepts are often difficult to express
in natural language, and even more so in conventional
programming languages. Our aim has therefore been to
create an interactive system that allows an experimenter
to describe basic perceptual concepts to the computer
using pictorial examples. The examples are designated
graphically by encircling portions of a displayed scene
with a cursor. Given a pictorial example, a representation
can be empirically constructed by trying pictorial
operators in order of increasing complexity until the
example is sufficiently distinguished from previously
determined representations. Pictorially defined concepts
constitute a shared vocabulary that can be used to
describe objects or to define more complex concepts.

This  paradigm  is  intended  to  elevate  the  user  above  the 
details  of  hand-coded  algorithms,  allowing  him  instead 
to  concentrate  on  the  construction  of  descriptions  and 
strategies. 
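
The idea of trying operators in order of increasing complexity can be
sketched as a simple search. The Python fragment below is a minimal
illustration under stated assumptions, not the procedure used in ISIS:
"sufficiently distinguished" is taken here to mean that the operator's
value range on the new example overlaps none of the ranges measured on
examples of previously defined concepts, and all names and toy values
are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sample = Dict[str, float]                          # toy sample: operator name -> value
Operator = Tuple[str, Callable[[Sample], float]]   # (name, function of a sample)
Range = Tuple[float, float]


def value_range(op: Callable[[Sample], float], samples: List[Sample]) -> Range:
    """Extreme values of an operator over a set of samples."""
    values = [op(s) for s in samples]
    return (min(values), max(values))


def disjoint(a: Range, b: Range) -> bool:
    return a[1] < b[0] or b[1] < a[0]


def first_distinguishing_operator(
        operators: List[Operator],
        example: List[Sample],
        known: Dict[str, List[Sample]]) -> Optional[Tuple[str, Range]]:
    """Try operators simplest-first; return the first that separates the
    new example from the examples of every previously defined concept."""
    for name, op in operators:
        candidate = value_range(op, example)
        if all(disjoint(candidate, value_range(op, other))
               for other in known.values()):
            return name, candidate
    return None   # no single operator suffices


# Operators listed in an assumed order of increasing complexity.
operators: List[Operator] = [
    ("HUE", lambda s: s["hue"]),
    ("HEIGHT", lambda s: s["height"]),
]
known = {"FLOOR": [{"hue": 30.0, "height": 0.0}, {"hue": 31.0, "height": 0.1}]}
tabletop_example = [{"hue": 30.0, "height": 2.4}, {"hue": 32.0, "height": 2.5}]
result = first_distinguishing_operator(operators, tabletop_example, known)
```

With these toy values the hue ranges of the two surfaces overlap, so
the search moves on and settles on height, which matches the remark
that FLOOR is distinguished from TABLETOP by height.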

Acknowledgments 

This work evolved from formulative discussions with
Martin Fischler, a strong and early advocate of an
interactive approach to scene analysis. We also wish to
acknowledge John Gaschnig, who helped implement an earlier
version of this system.

The research reported herein was primarily sponsored
by the Office of Naval Research under Contract
N00014-71-C-0294. Additional support was provided by the
Advanced Research Projects Agency under Contract
DAHC04-72-C-0008, and NASA Contract NASW-2086.




